Search results for "DNA sequence classificatio"

showing 4 items of 4 documents

A new feature selection strategy for K-mers sequence representation

2014

Decomposing a DNA sequence into k-mers (substrings of length k) and counting their frequencies defines a mapping of the sequence into a numerical space, represented by a numerical feature vector of fixed length. This simple process allows sequences to be compared in an alignment-free way, using common similarity and distance functions on the numerical codomain of the mapping. The most commonly used decomposition takes all substrings of length k, so the dimension of the codomain grows exponentially with k. This can affect the time complexity of the similarity computation and, more generally, of the machine learning algorithm used for sequence classification. Moreover, the presence of possible n…

Settore INF/01 - Informatica; k-mers; DNA sequence similarity; feature selection; DNA sequence classification

researchProduct
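The k-mer frequency mapping described in the abstract above can be illustrated with a minimal sketch. The function name `kmer_vector`, the fixed `ACGT` alphabet, and the choice to skip windows containing other symbols are assumptions made for illustration only; they are not taken from the paper.

```python
from itertools import product


def kmer_vector(sequence, k, alphabet="ACGT"):
    """Map a DNA sequence to a fixed-length vector of k-mer counts.

    Hypothetical helper: the vector has len(alphabet) ** k entries,
    one per possible k-mer, in lexicographic order of the alphabet.
    """
    # Enumerate every possible k-mer so the vector length is fixed.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    vector = [0] * len(kmers)

    seq = sequence.upper()
    # Slide a window of width k over the sequence and count each k-mer.
    for start in range(len(seq) - k + 1):
        pos = index.get(seq[start:start + k])
        if pos is not None:  # assumption: ignore windows with symbols such as N
            vector[pos] += 1
    return vector
```

With the four-letter alphabet the vector has 4^k entries (16 for k = 2, 64 for k = 3, 1024 for k = 5), which is the exponential growth in dimension that the abstract identifies as the motivation for feature selection.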

A New Feature Selection Methodology for K-mers Representation of DNA Sequences

2015

Decomposing a DNA sequence into k-mers and counting their frequencies defines a mapping of the sequence into a numerical space, represented by a numerical feature vector of fixed length. This simple process allows sequences to be compared in an alignment-free way, using common similarity and distance functions on the numerical codomain of the mapping. The most commonly used decomposition takes all substrings of a fixed length k, so the dimension of the codomain grows exponentially with k. This can affect the time complexity of the similarity computation and, more generally, of the machine learning algorithm used for sequence analysis. Moreover, the presence of possible noisy features can also affect the…

k-mers; DNA sequence similarity; feature selection; DNA sequence classification; Settore INF/01 - Informatica; Computer science; Sequence analysis; business.industry; Feature vector; Pattern recognition; Feature selection; DNA sequencing; Substring; Exponential function; Artificial intelligence; business; Algorithm; Time complexity

researchProduct

Deep Learning Architectures for DNA Sequence Classification

2016

DNA sequence classification is a key task in a generic computational framework for biomedical data analysis, and in recent years several machine learning techniques have been adopted to accomplish this task successfully. However, the main difficulty behind the problem remains the feature selection process. Sequences do not have explicit features, and the commonly used representations have the main drawback of high dimensionality. Machine learning methods devoted to supervised classification tasks depend strongly on the feature extraction step, and in order to build a good representation it is necessary to recognize and measure meaningful details of the items to cla…

DNA sequence classificatio; Convolutional Neural Networks; Recurrent Neural Networks; Deep learning networks; Settore INF/01 - Informatica

researchProduct

Alignment free Dissimilarities for sequence classification

2015

One way to represent a DNA sequence is to break it down into substrings of length L, called L-tuples, and count the occurrences of each L-tuple in the sequence. This representation defines a mapping of a sequence into a numerical space by a numerical feature vector of fixed length, which allows sequence similarity to be measured in an alignment-free way simply by using dissimilarity functions between vectors. This work presents a benchmark study of 4 alignment-free dissimilarity functions between sequences, computed on their L-tuple representation, for the purpose of sequence classification. In our experiments, we have tested the classes of geometric-based, correlation-based and information-based …

Settore INF/01 - Informatica; k-mers; L-tuples; DNA sequence similarity; DNA sequence classification; Knn classifier

researchProduct
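As a rough illustration of the approach outlined in the abstract above, the sketch below classifies a sequence by comparing L-tuple count vectors with a nearest-neighbour rule. The truncated abstract does not name the four dissimilarity functions, so the Euclidean distance used here is only an assumed stand-in for the geometric-based class. The names `ltuple_vector`, `euclidean`, and `knn_predict`, and the toy training data in the usage note, are likewise hypothetical.

```python
import math
from collections import Counter
from itertools import product


def ltuple_vector(sequence, L, alphabet="ACGT"):
    """Count every L-tuple of the alphabet in the sequence (fixed-length vector)."""
    counts = Counter(sequence[i:i + L] for i in range(len(sequence) - L + 1))
    # A Counter returns 0 for L-tuples that never occur in the sequence.
    return [counts["".join(p)] for p in product(alphabet, repeat=L)]


def euclidean(u, v):
    """Assumed example of a geometric-based dissimilarity between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(query, training, L=2, k=1, dissimilarity=euclidean):
    """Label a query sequence by majority vote among its k nearest training sequences.

    `training` is a list of (sequence, label) pairs.
    """
    q = ltuple_vector(query, L)
    ranked = sorted(
        training,
        key=lambda item: dissimilarity(q, ltuple_vector(item[0], L)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

For example, with the toy training set `[("AAAAAAAA", "A-rich"), ("GCGCGCGC", "GC-rich")]`, calling `knn_predict("AAAATAAA", training)` selects the label of the closer sequence. Passing a different function as `dissimilarity` is how a correlation-based or information-based measure could be swapped in.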